Method and system for controlling the improving of a program layout

ABSTRACT

A method and system for improving the working set of a program image. The working set (WS) improvement system of the present invention employs a two-phase technique for improving the working set. In the first phase, the WS improvement system inputs the program image and outputs a program image with the locality of its references improved. In the second phase, the WS improvement system inputs the program image with its locality of references improved and outputs a program image with the placement of its basic blocks in relation to page boundaries improved so that the working set is reduced.

RELATED APPLICATIONS

[0001] This patent application is related to U.S. patent application Ser. No. ______, entitled “Method and System for Improving the Layout of a Program Image Using Clustering” and U.S. patent application Ser. No. ______, entitled “Method and System for Incrementally Improving a Program Layout,” which are being filed concurrently and are hereby incorporated by reference.

TECHNICAL FIELD

[0002] This invention relates to a method and system for optimizing a computer program image and, more particularly, to a method and system for rearranging code portions of the program image to reduce the working set.

BACKGROUND OF THE INVENTION

[0003] Many conventional computer systems utilize virtual memory. Virtual memory provides a logical address space that is typically larger than the corresponding physical address space of the computer system. One of the primary benefits of using virtual memory is that it facilitates the execution of a program without the need for all of the program to be resident in main memory during execution. Rather, certain portions of the program may reside in secondary memory for part of the execution of the program. A common technique for implementing virtual memory is paging; a less popular technique is segmentation. Because most conventional computer systems utilize paging instead of segmentation, the following discussion refers to a paging system, but these techniques can be applied to segmentation systems or systems employing both paging and segmentation as well.

[0004] When paging is used, the logical address space is divided into a number of fixed-size blocks, known as pages. The physical address space is divided into like-sized blocks, known as page frames. A paging mechanism maps the pages from the logical address space, for example, secondary memory, into the page frames of the physical address space, for example, main memory. When the computer system attempts to reference an address on a page that is not present in main memory, a page fault occurs. After a page fault occurs, the operating system copies the page into main memory from secondary memory and then restarts the instruction that caused the fault.

[0005] One paging model that is commonly used to evaluate the performance of paging is the working set model. At any instant in time, t, there exists a working set, w(k, t), consisting of all the pages used by the k most recent memory references. The operating system monitors the working set of each process and allocates each process enough page frames to contain the process' working set. If the working set is larger than the number of allocated page frames, the system will be prone to thrashing. Thrashing refers to very high paging activity in which pages are regularly being swapped from secondary memory into the page frames allocated to a process. This behavior has a very high time and computational overhead. It is therefore desirable to reduce the size of (i.e., the number of pages in) a program's working set to lessen the likelihood of thrashing and significantly improve system performance.
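The definition of w(k, t) above can be illustrated with a minimal sketch. It assumes the memory references are available as a list of page numbers indexed by time; the function name `working_set` and the sample trace are hypothetical and do not come from the application.

```python
def working_set(page_refs, k, t):
    """Return w(k, t): the set of pages touched by the k most recent
    memory references up to and including time index t."""
    start = max(0, t - k + 1)
    return set(page_refs[start:t + 1])


# Hypothetical trace: one page number per memory reference.
trace = [0, 1, 0, 2, 2, 3, 0, 1]

# The 4 most recent references at t = 5 are at indices 2..5.
ws = working_set(trace, k=4, t=5)

# A process holding fewer page frames than len(ws) would be prone to thrashing.
allocated_frames = 2
prone_to_thrashing = len(ws) > allocated_frames
```

Under this reading, reducing the working set means making `len(ws)` small for the values of t that matter, which is what the layout changes described later aim at.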

[0006] A programmer typically writes source code without any concern for how the code will be divided into pages when it is executed. Similarly, a compiler program translates the source code into relocatable machine instructions and stores the instructions as object code in the order in which the compiler encounters the instructions in the source code. The object code therefore reflects the programmer's lack of concern for placement order. A linker program then merges related object code together to produce executable code. Again, the linker program has no knowledge of or concern for the working set of the resultant executable code. The linker program merely orders the instructions within the executable code in the order in which the instructions are encountered in the object code. The compiler program and linker program do not have the information required to place code within an executable module so as to reduce the working set. The information required can in general only be obtained by actually executing the executable module and observing its usage. Clearly this cannot be done before the executable module has been created. The executable module initially created by the compiler and linker thus is laid out without regard to any usage pattern.

[0007] As each portion of code is executed, the page in which it resides must be in physical memory. Other code portions residing on the same page will also be in memory, even if they may not be executed in temporal proximity. The result is a collection of pages in memory with some required code portions and some unrequired code portions. To the extent that unrequired code portions are loaded into memory, valuable memory space may be wasted, and the total number of pages loaded into memory may be much larger than necessary.

[0008] To determine which code portions are “required” and which code portions are “unrequired,” a developer needs execution information for each code portion, such as when the code portion is accessed during execution of the computer program. A common method for gathering such execution information includes adding instrumentation code to every basic block of a program image. A basic block is a portion of code such that if one instruction of the basic block is executed then every instruction is also executed. The execution of the computer program is divided into a series of time intervals (e.g., 500 milliseconds). Each time a basic block is executed during execution of the computer program, the instrumentation code causes a flag to be set for that basic block for the current time interval. Thus, after execution of the computer program, each basic block will have a temporal usage vector (“usage vector”) associated with it. The usage vector for a basic block has, for each time interval, a bit that indicates whether that basic block was executed during that time interval. The usage vectors therefore reflect the temporal usage pattern of the basic blocks.
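As an illustration of the usage vectors just described, the following sketch builds one flag per basic block per time interval from a log of execution events. The event format (timestamp in milliseconds, block identifier), the function name `build_usage_vectors`, and the sample data are assumptions made for this example; the application does not specify how the instrumentation records its data.

```python
from collections import defaultdict

INTERVAL_MS = 500  # interval length taken from the example in the text


def build_usage_vectors(events, num_intervals, interval_ms=INTERVAL_MS):
    """Build a temporal usage vector for each basic block.

    events: iterable of (timestamp_ms, block_id) records, one per execution
            of a basic block (hypothetical instrumentation output).
    Returns {block_id: [0 or 1 for each time interval]}.
    """
    vectors = defaultdict(lambda: [0] * num_intervals)
    for timestamp_ms, block_id in events:
        interval = timestamp_ms // interval_ms
        if 0 <= interval < num_intervals:
            vectors[block_id][interval] = 1  # set the flag for this interval
    return dict(vectors)


# Hypothetical run lasting 1.5 seconds (three 500 ms intervals).
events = [(10, "A"), (20, "B"), (600, "A"), (1400, "C")]
usage = build_usage_vectors(events, num_intervals=3)
```

Blocks whose vectors share set bits, such as "A" and "B" in the first interval, are candidates for placement on the same page.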

[0009] After the temporal usage patterns have been measured, a paging optimizer can rearrange the basic blocks to minimize the working set. In particular, basic blocks with similar temporal usage patterns can be stored on the same page. Thus, when a page is loaded into main memory, it contains basic blocks that are likely to be required.

[0010] The minimization of the working set is an NP-complete problem, that is, no polynomial-time algorithm is known for solving the problem. Thus, the time needed to minimize the working set of a program image generally increases exponentially as the number of code portions increases (i.e., O(e^n), where n is the number of code portions). Because complex program images can have thousands, and even hundreds of thousands, of code portions, such an algorithm cannot generate a minimum working set in a timely manner even when the most powerful computers are employed. Because the use of such algorithms is impractical for all but the smallest program images, various algorithms are needed to generate a layout that results in an improved working set (albeit not necessarily the minimal working set) in a timely manner.

SUMMARY OF THE INVENTION

[0011] The present invention provides a method and system for improving the working set of a program image. The working set (WS) improvement system of the present invention employs a two-phase technique for improving the working set. In the first phase, the WS improvement system inputs the program image and outputs a program image with the locality of its references improved. In the second phase, the WS improvement system inputs the program image with its locality of references improved and outputs a program image with the placement of its basic blocks in relation to page boundaries improved so that the working set is reduced.

[0012] The present invention provides a technique for evaluating the locality of references for a layout of a computer program. The technique calculates a metric value indicating a working set size of the layout when the layout is positioned to start at various different memory locations within a page. This technique then combines the calculated metric values as an indication of the locality of references of the layout of the computer program. By combining the calculated metric values, the effect of page boundaries on the working set size is averaged and the combined metric value represents the effects of the locality of references on the working set size.

[0013] The present invention provides a technique for estimating the rate of improvement in the working set for a plurality of incrementally improved layouts of a computer program. The technique estimates the change in working set size from one incrementally improved layout to the next incrementally improved layout and estimates the time needed to incrementally improve the layout. The technique then combines the estimated change in working set size with the estimated time needed to incrementally improve the working set for that layout to estimate the rate of improvement. By separately estimating the change in working set size and the time needed to incrementally improve the working set, different estimation techniques that are appropriate to the data being estimated can be used.

[0014] The present invention provides a technique for identifying coefficients for a filter for filtering results of a function. The technique collects sample input values to the filter and identifies desired output values from the filter for the collected sample input values. The technique then generates a power spectrum of the collected sample input values and a power spectrum of the identified desired output values. The technique then calculates the difference between the generated power spectra. Finally, the technique identifies coefficients that yield a filter transfer function that closely approximates the calculated differences. The present invention also provides a technique for identifying coefficients for a finite impulse response filter. The technique collects sample input values for a function and identifies desired output values for the filter for the collected sample input values. The technique then approximates the output values from the input values using a linear-fitting technique. Finally, the technique sets the coefficients to values obtained from the linear-fitting technique. When the input and output values represent the rate of change in working set size resulting from sample runs of the WS improvement system, then the filter can be used to estimate the rate of change dynamically as the improvement process proceeds.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 is a high-level block diagram of the computing environment in which various aspects of the present invention may be implemented.

[0016] FIG. 2 is a high-level flow diagram of an implementation of a routine to improve the locality of references of a layout of a program image.

[0017] FIG. 3 is a high-level flow diagram of a routine to improve the ordering of the basic blocks of a layout of a program image relative to page boundaries.

[0018] FIG. 4 is a flow diagram of an implementation of a routine to select an initial anchor basic block for the slinky algorithm.

[0019] FIG. 5 is a flow diagram of an implementation to find the basic block with the lowest metric value.

[0020] FIG. 6 illustrates the calculation of the LOR metric value.

[0021] FIG. 7 is a flow diagram of an implementation of a routine to calculate the LOR metric value.

[0022] FIG. 8 is a flow diagram of an implementation of a routine to select the number of layouts that should be generated and evaluated.

[0023] FIG. 9 is a flow diagram of a routine to evaluate the statistical relationships.

[0024] FIG. 10A is a graph of the WS metric values as a function of time for four generated layouts that have been incrementally improved.

[0025] FIG. 10B is a graph of the average WS metric values for various numbers of layouts.

[0026] FIG. 10C is a graph of the marginal reduction in the WS metric value as a function of time.

[0027] FIG. 11 is a block diagram illustrating the steps for separately calculating the rate of improvement per step and the time per step.

[0028] FIG. 12A is a graph of the WS metric value versus time for the incremental improvement process of a sample layout.

[0029] FIG. 12B is a graph of the improvement in the WS metric value for each time interval during the incremental improvement process.

[0030] FIG. 12C is a graph of the WS metric value versus step number.

[0031] FIG. 12D is a graph of the improvement in the WS metric value for each step.

[0032] FIG. 12E is a graph of the processing time per step.

[0033] FIG. 13 illustrates the defined rate of improvement for a stream of WS metric values.

[0034] FIG. 14 illustrates the defined rate of improvement and instantaneous rate of improvement.

[0035] FIG. 15 is a flow diagram of a routine to generate the defined rate of improvement for a stream of known WS metric values.

[0036] FIG. 16 illustrates the power spectra.

[0037] FIG. 17 illustrates the offset differences in the power spectra.

[0038] FIG. 18 is a flow diagram of a routine to generate the filter coefficients using the frequency-domain analysis.

[0039] FIG. 19 illustrates the instantaneous rate of improvement and the actual defined rate of improvement for a sample run.

[0040] FIG. 20 is a flow diagram of a routine to generate the coefficients of the filter using time-domain analysis.

[0041] FIG. 21 is a flow diagram of a routine to collect samples for generating coefficients for a filter.

[0042] FIG. 22 is a flow diagram of a routine that evaluates a set of AR coefficients.

DETAILED DESCRIPTION OF THE INVENTION

[0043] I. Overview

[0044] The present invention provides a method and system for improving the working set of a program image. The working set (WS) improvement system of the present invention employs a two-phase technique for improving the working set. In the first phase, the WS improvement system inputs the program image and outputs a program image with the locality of its references improved. In the second phase, the WS improvement system inputs the program image with its locality of references improved and outputs a program image with the placement of its basic blocks in relation to page boundaries improved so that the working set is reduced.

[0045] In the first phase, the WS improvement system generates various different layouts of the program image. The WS improvement system uses a locality of reference (LOR) metric function to evaluate the locality of the references of each layout. The WS improvement system then selects the layout with the best locality of references, as indicated by the LOR metric function, to process in the second phase. The present invention provides a layout number selection technique by which the number of the different layouts that are generated can be selected to balance the trade-off between the computational resources needed to generate additional layouts and the expected improvement in the resulting working set if the additional layouts are generated. In particular, the layout number selection technique analyzes the results of using the WS improvement system to improve the working set of various sample program images. The technique uses the LOR metric function to evaluate the locality of references of the layouts output by the first phase and uses a working set (WS) metric function to evaluate the working set of the layout output by the second phase. The technique correlates the metric values for the locality of references to the metric values for the working set. Based on this correlation, the layout number selection technique selects a number of layouts such that, if one more layout were to be generated, the computational expense of generating and evaluating that additional layout would not be worth the expected resulting improvement in the working set.

[0046] In the second phase, the WS improvement system incrementally improves the layout output by the first phase. The WS improvement system repeatedly modifies the layout of the program image to improve its working set. The WS improvement system uses the WS metric function to evaluate the working set after each incremental improvement of the layout. The present invention provides various termination conditions for determining when to terminate the incremental improvements of the layout. In one termination condition, referred to as the rate of improvement (ROI) termination condition, if the rate of improvement in the working set from one incrementally improved layout to the next falls below a threshold rate, then the WS improvement system terminates the incremental improvement of the second phase. The present invention also provides a ROI selection technique for selecting an algorithm to calculate the rate of improvement in the working set for the incrementally improved layouts.

[0047] FIG. 1 is a high-level block diagram of a computing environment in which various aspects of the present invention may be implemented. Although not required, the implementations are described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, and data structures that perform a particular task or manipulate and implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including multiprocessor systems, network PCs, mini-computers, mainframe computers, and similar computers. The invention may also be practiced in a distributed computing environment where tasks are performed by remote processing devices that are linked through a communications network. In such an environment, program modules may be located in both local and remote memory storage devices.

[0048] With reference to FIG. 1, an exemplary system includes a general purpose computing device 100 in the form of a conventional personal computer that includes a central processing unit 101, a memory 102, and various input/output devices 103. The memory includes read-only memory and random access memory. The personal computer includes storage devices such as a magnetic disk, an optical disk, or a CD-ROM. It will be appreciated by those skilled in the art that other types of computer-readable storage devices (i.e., media) may be used, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories, and read-only memories.

[0049] A number of different programs may be stored on the storage devices including an operating system, application programs, and the WS improvement system. The operating system and WS improvement system are loaded into memory for execution by the central processing unit. The WS improvement system includes a phase 1 component 106 and a phase 2 component 108. The phase 1 component inputs a layout 105 of a program image and outputs a layout 107 of the program image with the locality of its references improved. The phase 2 component inputs the layout with the locality of references improved and outputs a layout 109 of the program image with its working set improved.

[0050] FIG. 2 is a high-level flow diagram of an implementation of a routine to improve the locality of references of a layout of a program image. This routine is an implementation of the phase 1 component. The routine loops generating various layouts of the program image whose locality of references is to be improved. The number of layouts to generate is predefined by the layout number selection technique. The routine calculates a metric value, referred to as the locality of reference (LOR) metric value, that rates the various layouts based on their locality of references. The routine then returns the generated layout having the best locality of references, which is the layout with the lowest LOR metric value. In step 201, the routine invokes a subroutine to generate a layout for the program image. One algorithm to generate a layout is described in detail in copending patent application entitled “Method and System for Improving the Layout of a Program Image Using Clustering.” Because that algorithm has random selection aspects, each time the algorithm is invoked a different layout is typically generated. In particular, when the algorithm determines that various orderings of basic blocks will have the same effect on improving the locality of references, the algorithm randomly selects one of the orderings. Because a typical program image may have thousands of basic blocks, the algorithm makes many random selections. Thus, each invocation of the subroutine that implements the algorithm is likely to generate a different layout. One skilled in the art would appreciate that many other algorithms may be used to generate the various layouts. This routine can be used to select the layout with the best locality of references regardless of how the layouts are generated. In step 202, the routine invokes a subroutine to calculate the LOR metric value for the generated layout. In step 203, if the predefined number of layouts have already been generated, then the routine continues at step 204, else the routine loops to step 201 to generate the next layout. In step 204, the routine selects the generated layout with the lowest LOR metric value and returns that selected layout, which is the output of phase 1 and the input to phase 2.
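The loop of steps 201-204 can be sketched as follows. This is only an illustrative outline: `select_best_layout`, `generate_layout`, and `lor_metric` are hypothetical names, the layout generator and the metric are passed in as callables because the routine works regardless of how layouts are generated, and the toy inversion-count metric in the usage lines is a stand-in, not the LOR metric function.

```python
def select_best_layout(generate_layout, lor_metric, num_layouts):
    """Generate num_layouts layouts and return the one with the lowest
    metric value, together with that value (steps 201-204)."""
    best_layout, best_value = None, float("inf")
    for _ in range(num_layouts):          # step 203: stop after num_layouts
        layout = generate_layout()        # step 201: generate a layout
        value = lor_metric(layout)        # step 202: rate the layout
        if value < best_value:
            best_layout, best_value = layout, value
    return best_layout, best_value        # step 204: lowest value wins


# Stand-in generator: hands out three fixed orderings of basic blocks.
candidates = iter([[3, 1, 2], [1, 2, 3], [2, 3, 1]])


def toy_metric(layout):
    """Stand-in metric: number of out-of-order pairs (lower is better)."""
    return sum(1 for i in range(len(layout))
               for j in range(i + 1, len(layout)) if layout[i] > layout[j])


best, best_value = select_best_layout(lambda: next(candidates), toy_metric, 3)
```

Keeping only the running best avoids storing every generated layout, which may matter when a program image has many thousands of basic blocks.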

[0051] FIG. 3 is a high-level flow diagram of a routine to improve the ordering of the basic blocks of a layout of a program image relative to page boundaries. This routine is an implementation of the phase 2 component. One embodiment of this routine is described in detail in copending patent application entitled “Method and System for Incrementally Improving a Program Layout.” This routine loops, using what is referred to as a “slinky” algorithm, finding an anchor basic block and selecting another basic block such that when the basic blocks between the anchor basic block and the selected basic block are rearranged the working set of the program image is improved. The routine then rearranges the basic blocks in the range to improve the working set. The routine then repeats this process until a termination condition is satisfied. In steps 303-305, the routine performs the “slinky” algorithm to determine which basic blocks to rearrange. One skilled in the art would appreciate that various different algorithms can be used to select different arrangements of the basic blocks. In steps 307-309, the routine determines whether the termination condition is satisfied. If the termination condition is not satisfied, the routine loops to again incrementally improve the working set.
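A rough sketch of the phase 2 loop with a rate of improvement termination condition is given below. It is a simplification under stated assumptions: the "slinky" rearrangement is hidden behind a hypothetical `improve_step` callable that returns a candidate layout and the cost of producing it, the rate is the raw per-step improvement divided by that cost (not the filtered estimate mentioned in the summary), and `max_steps` is an added safety bound.

```python
def improve_until_slow(layout, improve_step, ws_metric, min_rate, max_steps=1000):
    """Repeat incremental improvement until the rate of improvement in the
    WS metric falls below min_rate.

    improve_step(layout) -> (candidate_layout, cost), with cost > 0
    (for example, elapsed seconds); hypothetical interface.
    """
    current = ws_metric(layout)
    for _ in range(max_steps):
        candidate, cost = improve_step(layout)
        new = ws_metric(candidate)
        rate = (current - new) / cost      # improvement per unit of cost
        if new < current:                  # keep only genuine improvements
            layout, current = candidate, new
        if rate < min_rate:                # ROI termination condition
            break
    return layout, current


# Toy usage: the "layout" is just its own metric value, and every step
# halves it at unit cost, so the per-step improvement keeps shrinking.
final_layout, final_value = improve_until_slow(
    100.0,
    improve_step=lambda value: (value * 0.5, 1.0),
    ws_metric=lambda value: value,
    min_rate=5.0,
)
```

In the toy run the improvements are 50, 25, 12.5, 6.25, and 3.125 per step, so the loop stops after the fifth step, when the rate first drops below the threshold of 5.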

[0052] II. Detailed Description

[0053] The present invention includes the following four aspects:

[0054] A. an LOR metric function that rates the locality of the references of a layout,

[0055] B. a layout number selection technique for selecting the number of layouts to generate and evaluate when selecting a layout with an improved locality of reference,

[0056] C. various termination conditions, including a rate of improvement (ROI) termination condition, for determining when to terminate the incremental improvements of the layout, and

[0057] D. a ROI selection technique for generating an algorithm to calculate the rate of improvement.

[0058] A. Locality of Reference (LOR) Metric Function

[0059] Phase 1 generates the various layouts preferably using the greedy agglomerative clustering technique as described in copending application “Method and System for Improving the Layout of a Program Image Using Clustering.” Phase 1 could employ several different techniques to select a layout as input for phase 2. The different techniques attempt to predict which layout will result in the best working set when processed by phase 2. The WS improvement system could rate such layouts by employing the WS metric function, which indicates the size of the working set. However, empirical analysis has shown a low correlation between the size of the working set of the layout input to phase 2 and the size of the working set of the layout output by phase 2. This low correlation may be due to accidental properties of the input layout that are not preserved through the incremental improvement process. Since any input layout will have some arbitrary degree of page positioning, this effect will be measured by the WS metric function. Thus, an input layout that happens to have a relatively good temporal usage pattern will have a WS metric value that is lower than that of other layouts that have a better overall locality of references.

[0060] Rather than using the WS metric function, the WS improvement system evaluates the layouts using a locality of reference (LOR) metric function. The LOR metric value for a layout is calculated by averaging the WS metric values that would result if the layout were positioned to start at various different locations on a page. The goal of this averaging is to produce a metric value that is independent of page boundaries. Thus, in one embodiment, the LOR metric function calculates a WS metric value for each address of a page assuming that the layout is positioned to start at that address. The LOR metric function then averages those WS metric values to generate the LOR metric value for the layout. Since a page typically contains 4,096 addresses, the LOR metric function would calculate 4,096 WS metric values, would sum those WS metric values, and would divide that sum by 4,096 to generate the LOR metric value. FIG. 6 illustrates the calculation of the LOR metric value. The layout 601 is initially assumed to start at address 0 of a page and the WS metric value is calculated. The layout 602 is then assumed to start at address 1 of a page and the WS metric value is calculated. The LOR metric function calculates a WS metric value for each address of a page. However, the calculation of the WS metric value for each address of a page is computationally intensive and has proved to be empirically unnecessary. Experiments have demonstrated that use of a small number of addresses, on the order of 10, can produce nearly as accurate an LOR metric value as does the use of every possible address of a page. To avoid harmonic effects, the LOR metric function uses addresses whose separations are relatively prime to each other. Table 1 lists 10 prime-separated addresses with approximately even distribution throughout a 4,096-byte page. Each separation is the distance from the start address in the same row to the next start address; the last separation wraps around to address 0 of the next page.

TABLE 1

Start Address    Separation
0                397
397              431
828              389
1217             443
1660             383
2043             421
2464             401
2865             419
3284             379
3663             433

[0061] FIG. 7 is a flow diagram of an implementation of a routine to calculate the LOR metric value. The routine loops selecting each of the start addresses as indicated in Table 1 and calculating the WS metric value assuming that the layout were to be positioned at the selected start address. The routine then uses the average of the WS metric values as the LOR metric value. In steps 701-704, the routine loops calculating a WS metric value for each of the start addresses. In step 701, the routine selects the next start address for the layout, starting with the first. In step 702, if all the start addresses have already been selected, then the routine continues at step 705, else the routine continues at step 703. In step 703, the routine positions the layout at the selected start address. In step 704, the routine calculates the WS metric value for the layout as positioned at the selected start address. The routine then loops to step 701 to select the next start address. In step 705, the routine calculates the average of the WS metric values and returns that average value as the LOR metric value for the layout.
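The routine of FIG. 7 can be sketched in a few lines. The start addresses are those of Table 1. Everything else is an assumption made for illustration: a layout is represented as an ordered list of (block identifier, size in bytes) pairs with positive sizes, usage vectors are lists of 0/1 flags per time interval, and `ws_metric` is a stand-in that counts the pages touched in each interval and sums the counts. The application does not define the WS metric function in this section.

```python
PAGE_SIZE = 4096
# Start addresses from Table 1.
START_ADDRESSES = [0, 397, 828, 1217, 1660, 2043, 2464, 2865, 3284, 3663]


def ws_metric(layout, usage, start_address, page_size=PAGE_SIZE):
    """Stand-in WS metric: pages touched per time interval, summed over
    all intervals, with the layout positioned at start_address."""
    num_intervals = len(next(iter(usage.values())))
    pages_per_interval = [set() for _ in range(num_intervals)]
    address = start_address
    for block_id, size in layout:
        first_page = address // page_size
        last_page = (address + size - 1) // page_size
        for interval, used in enumerate(usage[block_id]):
            if used:
                pages_per_interval[interval].update(range(first_page, last_page + 1))
        address += size
    return sum(len(pages) for pages in pages_per_interval)


def lor_metric(layout, usage, start_addresses=START_ADDRESSES):
    """Average the WS metric over the start addresses (steps 701-705)."""
    values = [ws_metric(layout, usage, start) for start in start_addresses]
    return sum(values) / len(values)


# One page-sized block used in a single interval: it fits on one page only
# when the layout starts at address 0, and straddles two pages otherwise.
layout = [("A", 4096)]
usage = {"A": [1]}
lor = lor_metric(layout, usage)
```

The example shows why a single WS metric value can mislead: the value at start address 0 is 1, while the average over the ten start addresses is 1.9.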

[0062] B. Layout Number Selection Technique

[0063] The overall performance of the WS improvement system, both in terms of resulting working set size and of computational speed, is affected by the number of layouts that are generated and evaluated in phase 1. At one extreme, the WS improvement system could simply skip the layout improvement step and incrementally improve the layout of the program image as generated by the linker. Alternatively, the WS improvement system could generate only one layout in phase 1 and incrementally improve that layout. Such an approach would be computationally fast, but may result in a working set size that is less than desirable. At the other extreme, the WS improvement system could generate hundreds of layouts and select the best one to incrementally improve based on the LOR metric values of the layouts. Of course, this approach would be computationally expensive, but would be likely to produce a very desirable working set size. Thus, as the number of layouts generated increases, the chance of generating a layout with a very low LOR metric value increases. However, the expected marginal improvement in the LOR metric value decreases. The layout number selection technique selects the number of layouts that should be generated by determining whether it would be more beneficial to generate and evaluate one more layout or more beneficial to use the computational resources that would have been used to generate and evaluate that additional layout to further incrementally improve the layout with the best LOR metric value without generating and evaluating an additional layout.

[0064] To determine where it would be more beneficial (in terms of working set size) to expend the computational resources, the layout number selection technique collects the results of many runs of the WS improvement system and, based on a statistical analysis of the results, determines the likely benefit on working set size of generating and evaluating a certain number of layouts and the incremental benefit of generating and evaluating one more layout. The number of layouts generated and evaluated could then be set such that the incremental benefit of generating one more layout would not be worth the computational effort. This technique assumes that the results of the many runs are representative of the results of the layouts to be improved. Thus, this technique is most useful in environments in which the program images of the many runs differ only slightly from the program image to be improved. Such a similarity in program images, for example, may exist between daily builds of a program image during development of an application program.

[0065] The layout number selection technique also assumes that the LOR metric values of multiple layouts of a given program image are normally distributed, that the WS metric values of the output layouts of phase 2 generated from the multiple input layouts are also normally distributed, and that these two distributions are normally correlated. These assumptions appear to be fairly accurate to a first-order approximation. The technique evaluates the results of many runs of the WS improvement system on a wide variety of program images. The technique then calculates

[0066] the standard deviation (σ) of the WS metric values of the output layouts of phase 2, and

[0067] the normal correlation coefficient (ρ) between the LOR metric value on the input layouts to phase 2 and the WS metric value on the output layouts of phase 2.

[0068] The probability density function for a standard bivariate normal distribution is $f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^{2}}}\, e^{-\frac{x^{2} - 2\rho xy + y^{2}}{2(1-\rho^{2})}}$

[0069] The technique calculates the marginal density of the WS metric value of the output layout that is produced from the input layout of phase 2 with the lowest LOR metric value. Since the problem is symmetric and since any one of the input layouts might have the lowest LOR metric value, the technique assumes that a selected layout has the lowest LOR metric value and then multiplies the resulting density function by the number of layouts (N). The technique then integrates over all values of the N−1 density functions' LOR metric values that are greater than the selected layout's LOR metric value, then over all values of the N−1 density functions' WS metric values, and finally over all values of the selected layout's LOR metric value. The result is $g(y) = N\int_{-\infty}^{\infty} f(x,y)\left(\int_{-\infty}^{\infty}\int_{x}^{\infty} f(t,u)\,dt\,du\right)^{N-1} dx$

[0070] The mean value of this marginal density is $\mu = \int_{-\infty}^{\infty} y\, g(y)\, dy$

[0071] Although no closed-form solution exists for this quadruple integral, it may be evaluated numerically to any desired degree of precision. The product of this normalized mean with the standard deviation of the WS metric value on the output layout yields the expected reduction in the final WS metric value from selecting the best of N input layouts, rather than generating only one.

reduction=μσ
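
The normalized mean μ can also be approximated by simulation rather than by numerical integration. The following is a minimal sketch under that substitution: it draws N correlated standard normal pairs, keeps the WS value of the pair with the lowest LOR value, and averages over many trials. The function names, the trial count, and the sign convention (reporting the reduction as −μσ so that it is positive when ρ is positive) are assumptions made for illustration only.

```python
import math
import random


def expected_normalized_mean(n_layouts, rho, trials=50_000, seed=0):
    """Monte Carlo estimate of mu: the mean standardized WS metric value of
    the layout whose standardized LOR metric value is lowest among n_layouts."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(trials):
        best_x = math.inf
        best_y = 0.0
        for _ in range(n_layouts):
            x = rng.gauss(0.0, 1.0)                    # standardized LOR value
            y = rho * x + scale * rng.gauss(0.0, 1.0)  # correlated WS value
            if x < best_x:
                best_x, best_y = x, y
        total += best_y
    return total / trials


def expected_reduction(n_layouts, rho, sigma, trials=50_000, seed=0):
    """Expected reduction in the final WS metric value (assumed sign: -mu * sigma)."""
    mu = expected_normalized_mean(n_layouts, rho, trials, seed)
    return -mu * sigma
```

The difference between `expected_reduction(n + 1, rho, sigma)` and `expected_reduction(n, rho, sigma)` approximates the marginal benefit of generating one more layout.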

[0072] Once the expected reduction has been determined, one can evaluate the trade-off between the computational expense of generating and evaluating one more layout versus the expected improvement in the resulting working set from generating and evaluating that additional layout. This trade-off can be evaluated against the trade-off between the computational expense of additional incremental improvement steps versus the expected improvement for performing these additional steps. For a relatively small number of incremental improvement steps, there is likely to be greater benefit to extending the number of incremental improvement steps during phase 2 than in generating and evaluating more layouts during phase 1. For a relatively large number of incremental improvement steps, there is likely to be greater benefit to generating and evaluating more layouts during phase 1 than in increasing the number of incremental improvement steps during phase 2.

[0073] FIG. 8 is a flow diagram of an implementation of a routine to select the number of layouts that should be generated and evaluated. This routine is an implementation of the layout number selection technique. This routine calculates the expected marginal reduction in the WS metric value from increasing the number of layouts generated during phase 1 from 1 to 2, 2 to 3, 3 to 4, and so on until the expected marginal reduction is not worth the computational expense of generating and evaluating that additional layout. In step 801, the routine invokes a subroutine to generate the statistical relationships for the LOR metric values and the WS metric values collected from various runs of the WS improvement system. In step 802, the routine sets the number of layouts generated during phase 1 to one. In steps 803-805, the routine loops evaluating the expected marginal reduction in the WS metric value of the output layout resulting from generating one more layout during phase 1. In step 803, the routine calculates the expected marginal reduction in the WS metric value of the layout output by phase 2 resulting from increasing the currently selected number of layouts generated during phase 1 by one. In step 804, if the marginal reduction is worth the computational expense, then the routine continues at step 805, else the routine completes. In step 805, the routine increments the number of layouts currently selected as being generated during phase 1 and loops to step 803 to calculate the expected marginal reduction. The number of layouts currently selected when the routine completes is the number selected by the layout number selection technique.
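
The loop of steps 802-805 can be sketched as follows. This is an illustrative sketch only: `expected_reduction` and `is_worth_expense` are hypothetical callables supplied by the caller (the first stands in for the statistics generated in step 801), and the `max_layouts` cap is an added safeguard that is not part of the described routine.

```python
def select_number_of_layouts(expected_reduction, is_worth_expense, max_layouts=1000):
    """Return the number of layouts to generate during phase 1.

    expected_reduction(n): expected reduction in the WS metric value from
        selecting the best of n layouts (hypothetical callable).
    is_worth_expense(marginal): True if a marginal reduction justifies the
        cost of generating and evaluating one more layout (hypothetical).
    """
    num_layouts = 1                                        # step 802
    while num_layouts < max_layouts:
        marginal = (expected_reduction(num_layouts + 1)
                    - expected_reduction(num_layouts))     # step 803
        if not is_worth_expense(marginal):                 # step 804
            break
        num_layouts += 1                                   # step 805
    return num_layouts
```

For example, with a reduction curve that flattens geometrically and a fixed threshold, the loop stops at the first layout count whose marginal reduction falls below the threshold.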

[0074] FIG. 9 is a flow diagram of a routine to evaluate the statistical relationships. This routine generates a number of layouts, calculates the LOR metric value for each layout, incrementally improves each layout, and calculates the WS metric value for each incrementally improved layout. The routine then calculates the standard deviation (σ) and normal correlation coefficient (ρ) as described above. In steps 901-906, the routine loops generating layouts, calculating the LOR metric value for the layouts, incrementally improving the generated layouts, and calculating the WS metric values for the incrementally improved layouts. In step 901, the routine generates a layout. In step 902, the routine calculates the LOR metric value for the generated layout. In steps 903-904, the routine loops incrementally improving the generated layout until a termination condition is satisfied. The termination condition can be either a fixed number of iterations through the incremental improvement or a specified time period. In step 905, the routine calculates the WS metric value for the incrementally improved layout. In step 906, if enough layouts have already been generated, then the routine continues at step 907, else the routine loops to step 901 to generate another layout. In step 907, the routine calculates the standard deviation (σ) of the WS metric values of the incrementally improved layouts. In step 908, the routine calculates the normal correlation coefficient (ρ) between the LOR and WS metric values and returns.
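
Steps 907 and 908 reduce to two standard statistics over the collected pairs of metric values. The sketch below assumes the LOR and WS metric values have already been gathered into two equal-length lists, and it uses the population (divide by n) form of the variance; both the function name and that choice are assumptions.

```python
import math


def summarize_runs(lor_values, ws_values):
    """Return (sigma, rho): the standard deviation of the WS metric values and
    the correlation coefficient between the LOR and WS metric values."""
    n = len(ws_values)
    if n != len(lor_values) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_lor = sum(lor_values) / n
    mean_ws = sum(ws_values) / n
    var_lor = sum((x - mean_lor) ** 2 for x in lor_values) / n
    var_ws = sum((y - mean_ws) ** 2 for y in ws_values) / n
    if var_lor == 0.0 or var_ws == 0.0:
        raise ValueError("correlation is undefined when a series is constant")
    cov = sum((x - mean_lor) * (y - mean_ws)
              for x, y in zip(lor_values, ws_values)) / n
    sigma = math.sqrt(var_ws)                 # step 907
    rho = cov / math.sqrt(var_lor * var_ws)   # step 908
    return sigma, rho
```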

[0075] FIGS. 10A-10C are graphs illustrating the layout number selection technique. FIG. 10A is a graph of the WS metric values as a function of iterative improvement time for four generated layouts. The WS metric values are shown in the solid lines and the dashed line represents the average of the WS metric values. FIG. 10B is a graph of the average WS metric values for various numbers of layouts. The dashed lines illustrate the marginal reduction in the WS metric value at time t as a result of generating one more layout during phase 1. FIG. 10C is a graph of the marginal reduction in the WS metric value as a function of iterative improvement time.

[0076] C. Termination Conditions for Incremental Improvements

[0077] The WS improvement system may use various conditions for terminating the incremental improvement process. The WS improvement system may determine whether a termination condition is satisfied after each incremental step. An incremental step corresponds to the processing of steps 301-308 of FIG. 3. The WS improvement system evaluates whether a termination condition is satisfied in step 309. In particular, the WS improvement system may use one of the following termination conditions:

[0078] 1. fixed number of incremental steps,

[0079] 2. fixed amount of elapsed time,

[0080] 3. WS metric value of the incrementally improved layouts, or

[0081] 4. rate of improvement (ROI) of the WS metric value of the incrementally improved layouts.

[0082] One of these termination conditions or a combination of these termination conditions may be used depending on the development environment and program image to be improved. Each of these termination conditions is described below. The ROI termination condition, which has general applicability to many development environments and program images, is described in detail.

[0083] 1. Fixed Number of Incremental Steps

[0084] The WS improvement system can terminate the incremental improvement process after a fixed number of incremental steps. The fixed number that is selected for terminating the incremental improvement process can be determined by evaluating the results of many runs of the WS improvement system on a wide variety of data. The mean WS metric value after each number of incremental steps can be compared to the desired trade-off between the working set size and computational expense within any statistical margin that is desired. The use of a fixed number of incremental steps is well-suited to environments in which the program images to be improved are similar. Such similarity may occur during the development of a program in which an executable program is built every day that differs only slightly from day to day.

[0085] 2. Fixed Amount of Elapsed Time

[0086] The WS improvement system can also terminate the incremental improvement process after a specified amount of time has elapsed. After each incremental step, the system can compare the current time to the start time, and if the difference is greater than the fixed amount of time, then the termination condition is satisfied. The use of a fixed amount of time may be particularly advantageous during development of a program. A production build process is likely to be allotted a fixed amount of total time, such as a few hours overnight, and some portion of this may be reserved for layout improvement. Thus, the WS improvement system improves the layout by as much as it can within the fixed amount of time and then terminates.

[0087] 3. WS Metric Value

[0088] The WS improvement system can terminate the incremental improvement process when the WS metric value drops below a preset value. The preset value may be determined either as an absolute value, as a function of the initial WS metric value, as a function of a lower bound on the WS metric value, or as some combination of these. However, for any given program image, the WS metric value may never become less than the preset value. The incremental improvement process generally results in WS metric values along a curve that resembles an exponential decay. For any given starting point and sequence of improvements, there is a minimum value that is approached by the incremental improvement process. Thus, if the preset value is less than this minimum value, the termination condition will never be satisfied. Nevertheless, such a termination condition may be useful if it is used in conjunction with one of the other termination conditions or if the preset value is known to be achievable.
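
Because the preset-value condition may never be satisfied on its own, it is natural to combine it with the step-count or elapsed-time conditions. The following sketch shows one possible combination of the first three conditions; the function name, parameter names, and the use of a monotonic clock are assumptions for illustration and not part of the described system.

```python
import time


def should_terminate(step, start_time, ws_value, *, max_steps=None,
                     max_seconds=None, ws_target=None, now=None):
    """Return True if any enabled termination condition is satisfied.

    A condition is disabled by leaving its limit as None. The `now` argument
    exists only so that the check can be exercised with a fixed clock.
    """
    current = time.monotonic() if now is None else now
    if max_steps is not None and step >= max_steps:                   # condition 1
        return True
    if max_seconds is not None and current - start_time > max_seconds:  # condition 2
        return True
    if ws_target is not None and ws_value < ws_target:                # condition 3
        return True
    return False
```

Calling this check after each incremental step with both `ws_target` and `max_seconds` set guarantees termination even when the preset value is below the minimum that the process can reach.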

[0089] 4. Rate of Improvement (ROI) of WS Metric Value

[0090] The WS improvement system can also terminate the incremental improvement process when the rate of improvement of the WS metric value drops below a certain rate. However, it can be difficult to determine what actually is the rate of improvement. First, although the size of the improvement in the WS metric value (i.e., change in WS metric value) generally decreases as the incremental improvement process proceeds, the size of the improvement does not decrease monotonically. That is, the change in the WS metric value from one incremental step to the next may increase or decrease as the incremental improvement process proceeds. Second, the WS metric value itself does not even decrease monotonically because of the interaction with the linker. That is, when the linker is periodically invoked during the incremental improvement process to determine a size for the basic blocks, the WS metric value of the layout with the newly determined sizes of the basic blocks may be larger than the WS metric value calculated for the previous incremental step. To overcome these difficulties, the WS improvement system determines the rate of improvement by filtering the WS metric values through a filter. The ROI termination condition is satisfied when the filtered rate of improvement falls below a specified rate.

[0091] The filtering technique is described in the following. The rate of improvement may be defined as the change in the WS metric value per time interval (i.e., “Δ WS metric value/time”). The rate of improvement per time interval is related to the change in WS metric value per step (i.e., “Δ WS metric value/step”) by the following equation:

Δ WS metric value/time = (Δ WS metric value/step) ÷ (time/step)

[0092] The WS improvement system separates the rate of improvement into two components: the improvement in WS metric value per step and the time per step. The WS improvement system calculates a rate of improvement per step and then divides that calculated rate of improvement by a calculated time per step to generate the rate of improvement. By separating the rate of improvement into these two components, the WS improvement system can apply separate smoothing or approximation techniques to each component as appropriate. In the embodiment described below, the WS improvement system calculates the rate of improvement per step using a filter and calculates the time per step using a predefined approximation function. The WS improvement system then combines these values to calculate the rate of improvement per time interval. FIG. 11 is a block diagram illustrating the steps for separately calculating the rate of improvement per step and the time per step. In steps 1101-1104, the WS improvement system calculates the rate of improvement per step. The WS improvement system inputs a layout and calculates the WS metric value for the layout. The WS improvement system then calculates the running minimum of the WS metric value. The running minimum represents a value that decreases monotonically. The WS improvement system then calculates the instantaneous rate of improvement based on the current running minimum. The WS improvement system then filters the instantaneous rate of improvement. In step 1105, the WS improvement system inputs the number of the incremental step that produced the layout and calculates the time for that step. Finally, in step 1106, the WS improvement system combines the filtered rate of improvement per step and the calculated time per step to generate the rate of improvement.
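
The data flow of FIG. 11 can be sketched as a single pass over the WS metric values. This is a hedged illustration: the first-order recursive filter and its coefficients `a0` and `b1` are placeholders chosen for the example, and `time_per_step` is a hypothetical callable standing in for the predefined approximation function.

```python
def rate_of_improvement(ws_values, time_per_step, a0=0.2, b1=0.8):
    """Return the filtered rate of improvement per unit time after each step.

    ws_values: WS metric values, one per layout, starting with the initial layout.
    time_per_step: callable mapping a step number to its estimated time
        (hypothetical stand-in for the approximation function).
    a0, b1: placeholder filter coefficients, not values from the source.
    """
    rates = []
    running_min = None
    filtered = 0.0
    for step, ws in enumerate(ws_values):
        if running_min is None:
            running_min = ws                         # initial layout, no improvement yet
            continue
        new_min = min(running_min, ws)               # running minimum never increases
        instantaneous = running_min - new_min        # improvement per step, >= 0
        running_min = new_min
        filtered = a0 * instantaneous + b1 * filtered    # filter the per-step rate
        rates.append(filtered / time_per_step(step))     # combine with time per step
    return rates
```

The ROI termination condition would then compare the most recent entry of the returned list with the specified rate.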

[0093] A review of the graphs of the various measurements relating to the rate of improvement helps to illustrate the need for filtering. FIG. 12A is a graph of the WS metric value versus time for the incremental improvement process of a sample layout. For example, at time 200 the corresponding WS metric value is approximately 11.8. As the incremental improvement process proceeds, the WS metric value of the incrementally improved layout decreases. However, the improvement from one time interval to the next does not decrease monotonically. For example, a small improvement occurs during time interval 340-345 and a large improvement occurs during the time interval 345-350. FIG. 12B is a graph of the improvement in the WS metric value for each time interval during the incremental improvement process. This graph is generated by taking the difference between WS metric values in successive time intervals. As can be seen from this graph, the improvement per time interval is highly erratic and not monotonic. FIG. 12C is a graph of the WS metric value versus step number. The rate of improvement generally decreases in each step, but does not decrease monotonically. FIG. 12D is a graph of the improvement in the WS metric value for each step. This graph is generated by taking the difference between the WS metric values of successive steps. Although the graph is somewhat erratic, the general trend is a lower rate of improvement as the number of steps increases. FIG. 12E is a graph of the processing time per step. The processing time is fairly high for the first four steps, drops for the next six steps, and then drops further and continues at an erratic level, but generally tends to increase towards steps 70 and 80.

[0094] a) Calculating the Processing Time Per Incremental Step

[0095] The processing time per incremental step varies substantially over the course of the incremental improvement process as shown in FIG. 12E. In one embodiment, the WS improvement system, rather than filtering the actual time per step to effect smoothing, estimates the expected time per step as a function of several control parameters that tend to describe the amount of processing during each step. The control parameters can be selected according to the particular incremental improvement algorithm used. In the following, the control parameters for the incremental improvement algorithm that uses the slinky algorithm of FIGS. 3-5 are described. FIG. 3 illustrates the overall incremental improvement process. FIG. 4 is a flow diagram of an implementation of a routine to select an initial anchor basic block for the slinky algorithm. FIG. 5 is a flow diagram of an implementation of a routine to find the basic block with the lowest metric value. The basic block with the lowest metric value is that basic block such that when the basic blocks between the anchor basic block and that basic block are rearranged, the resulting metric value of the layout is the lowest. The control parameters for this algorithm are the following:

[0096] number of times (NR) the slinky algorithm is repeated for each incremental step. This number corresponds to the number of times the sequence of steps 303-306 is performed for each incremental step. This number can vary from one incremental step to the next. In the routine illustrated in FIG. 3, the slinky algorithm is repeated only once for each incremental step. If the slinky algorithm were to be repeated multiple times, then a step before step 307 would determine whether the slinky algorithm had been repeated the designated number of times for that incremental step. If not, the routine would loop to step 303, else the routine would continue at step 307.

[0097] number of sets of basic blocks (NX) identified when searching for an initial anchor basic block. This number corresponds to the number of sets of basic blocks identified in step 401 and to the number of times that step 405 loops to step 402 in FIG. 4.

[0098] number of basic blocks (NY) in each identified set of basic blocks. This number corresponds to the number in each set identified in step 401 and to the number of times that steps 407 and 409 loop to step 404 in FIG. 4 for each set.

[0099] number of slinky sub-steps (NS) per incremental step. This number corresponds to the number of ranges of basic blocks evaluated during a search of the slinky algorithm and corresponds to the number of times that step 305 in FIG. 3 loops through step 306 to step 304.

[0100] maximum search distance (MD) of a slinky sub-step. This distance corresponds to the number of basic blocks evaluated when identifying a range of basic blocks and corresponds to the number of times that step 502 passes control to step 503 in FIG. 5.

[0101] number of basic blocks per page in the program image (BP).

[0102] various constant terms that can be measured from runs of the incremental improvement system (Cx).

[0103] Several of the control parameters contain random components. For example, the number of sets of basic blocks identified (NX) and the number of slinky sub-steps (NS) have a random component. Thus, their expected (mean) values are used.

[0104] The amount of processing time required by an incremental step is approximately equal to the number of alternate layouts evaluated multiplied by the time required to perform one evaluation. Alternate layouts are generated and evaluated by the routine of FIG. 4 that designates the initial anchor basic block. The number of evaluations performed by that routine is

NX·NY

[0105] The slinky algorithm of FIG. 3 requires the following number of evaluations per step:

NS·MD

[0106] Since the slinky algorithm can be repeated multiple times for a single incremental step, the total number of evaluations is equal to:

NR·(NX·NY+NS·MD)

[0107] The evaluation of each alternate layout requires some constant amount of time (C1), plus an additional amount (C2) that is proportional to the number of pages evaluated, plus some amount (C3) for each block whose usage vector must be logically ORed to compute the page usage vectors. The number of pages evaluated is determined by the maximum search distance (expressed in basic blocks) and the number of blocks per page. Thus, a single layout evaluation requires the following amount of time:

C1+C2·(MD/BP)+C3·MD

[0108] Thus, the following formula expresses the amount of time required for each step:

NR·(NX·NY+NS·MD)·(C1+C2·(MD/BP)+C3·MD)

[0109] Using this formula, the expected time per step as the incremental improvement process proceeds can be estimated. Since only mean values of the control parameters with random components are used in the formula, short-term variations in the time per step due to randomness are effectively eliminated.
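
The time-per-step formula translates directly into code. In the sketch below, the function name and the lowercase parameter names are assumptions; the arithmetic follows the formula NR·(NX·NY+NS·MD)·(C1+C2·(MD/BP)+C3·MD), with mean values supplied for the parameters that have random components.

```python
def estimated_time_per_step(nr, nx, ny, ns, md, bp, c1, c2, c3):
    """Estimate the processing time of one incremental step.

    nr: repetitions of the slinky algorithm per step (NR)
    nx, ny: sets of basic blocks and blocks per set for the anchor search (NX, NY)
    ns, md: slinky sub-steps and maximum search distance (NS, MD)
    bp: basic blocks per page (BP)
    c1, c2, c3: measured constants (C1, C2, C3)
    """
    evaluations = nr * (nx * ny + ns * md)
    time_per_evaluation = c1 + c2 * (md / bp) + c3 * md
    return evaluations * time_per_evaluation
```

Because NR multiplies the whole expression, halving the number of repetitions halves the estimate when the other parameters are unchanged.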

[0110] The effect of various values of these control parameters on the actual time per step can be seen in FIG. 12E. The incremental improvement process repeated the slinky algorithm two times (i.e., NR=2) during each of the first four incremental steps and only once per incremental step thereafter. Thus, the time per step dropped from around 15 to around 6 from step 4 to step 5. The incremental improvement process identified a certain number (NX) of sets of basic blocks when identifying an initial anchor basic block for each slinky algorithm search during the first 10 steps and used a lower number for the remainder of the incremental improvement process. The effect of using this lower number is seen in the drop in time per step from around 6 in step 10 to around 2 from step 11 onward. Also, the maximum search distance (MD) gradually decreases and the number of basic blocks (NY) per identified set of basic blocks gradually increases throughout the incremental improvement process. This decrease and increase result in an overall slow decrease in the time per step for steps 11-31 followed by an overall slow increase in the time per step.

[0111] b) Filtering the Δ WS Metric Value/Step

[0112] (1) Background on Filters

[0113] Filtering techniques for a stream of input values generally calculate a weighted average of several sequential input values. The goal of the filtering is to smooth out any large variations in the input values so that overall trends of the input values can be more easily identified from the filtered values. A filtering technique is generally described in terms of an equation that specifies the weighted average calculation. The following equation is an example of such an equation:

$y_{i} = A_{0} x_{i} + A_{1} x_{i-1}$

[0114] where $y_{i}$ represents the i-th filtered value, where $x_{i}$ represents the i-th input value, and where $A_{N}$ represents the weight to be applied to the (i−N)-th input value. In this example equation, if $A_{0} + A_{1} = 1$, then the filtered value is the weighted average of the current input value and the previous input value. Because the equation combines two input values, it is referred to as a second order filter. Filters whose filtered values are based solely on a fixed number of previous input values (i.e., the order) are referred to as finite impulse response (FIR) filters or moving average (MA) filters. Certain filters generate filtered values that are based on a history of all the previous input values and are referred to as infinite impulse response (IIR) filters or autoregressive (AR) filters. The following equation is an example equation of an IIR filter:

$y_{i} = A_{0} x_{i} + B_{1} y_{i-1}$

[0115] where $y_{i}$ represents the i-th filtered value, where $x_{i}$ represents the i-th input value, where $A_{0}$ represents the weight to apply to the i-th input value, and where $B_{1}$ represents the weight to apply to the filtered value $y_{i-1}$. Because each filtered value of an IIR filter is based on one or more previous filtered values and input values, each filtered value is based on every previous input value. In other words, the first input value has an influence, albeit increasingly small, on every filtered value no matter how many are generated. Indeed, the influence decays exponentially.
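
The two example equations can be compared by feeding each filter a single unit impulse. The sketch below is illustrative only; the function names are assumptions, and both filters are assumed to start from zero initial state (no previous input or output).

```python
def fir_filter(inputs, a0, a1):
    """Apply y[i] = a0 * x[i] + a1 * x[i-1], assuming x[-1] = 0."""
    outputs = []
    prev_x = 0.0
    for x in inputs:
        outputs.append(a0 * x + a1 * prev_x)
        prev_x = x
    return outputs


def iir_filter(inputs, a0, b1):
    """Apply y[i] = a0 * x[i] + b1 * y[i-1], assuming y[-1] = 0."""
    outputs = []
    prev_y = 0.0
    for x in inputs:
        prev_y = a0 * x + b1 * prev_y
        outputs.append(prev_y)
    return outputs
```

With the input `[1, 0, 0, 0]` and both weights set to 0.5, the FIR output returns to zero after two values, whereas the IIR output halves at every step and never reaches zero, which is the exponential decay described above.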

[0116] (2) The Rate of Improvement Per Step

[0117] The goal of filtering the ΔWS metric values is to produce a stream of filtered ΔWS metric values that reflect the overall rate of improvement in the working set as a result of each incremental step. Given a stream of WS metric values, the rate of improvement at each step is defined as the maximum rate such that if the improvements are continued at that maximum rate, then a WS metric value that is actually present in the stream would result. FIG. 13 illustrates the defined rate of improvement for a stream of WS metric values. The dashed line represents the actual WS metric values and the solid line represents the WS metric values that would result if the defined rate of improvement matched the actual rate of improvement at each incremental step. The solid line is referred to mathematically as the convex hull of the WS metric function, because it is the largest-valued convex curve that lies entirely below the WS metric function. The slope of the convex hull is the defined rate of improvement. (Strictly speaking, the slope is a negative quantity, because the value of the metric function is decreasing over time. So, the use herein of the term “slope” refers to the absolute value of the slope.) FIG. 14 illustrates the defined rate of improvement and instantaneous rate of improvement. The instantaneous rate of improvement is shown in the dashed line, and the defined rate of improvement is shown in the solid line. The defined rate of improvement has the desirable property that eventually the average rate of improvement over a number of incremental steps will equal that defined rate. Thus, the defined rate of improvement is used to decide when to terminate the incremental improvement process. The instantaneous rate of improvement has the undesirable property that the improvement at one step can be zero or negative, but be very large at the next step. Thus, if the instantaneous rate of improvement were used to terminate the incremental improvement, then termination might occur just before an incremental step that produces a significant improvement.

[0118] FIG. 15 is a flow diagram of a routine to generate the defined rate of improvement for a stream of known WS metric values. As described below in detail, this routine is used when analyzing the WS metric values of various runs of the WS improvement system to generate coefficients for the filter. The defined rate of improvement, as described above, is (the absolute value of) the slope of the convex hull. The routine generates the defined rate of improvement by conceptually selecting a starting point on the graph of the WS metric values and searching to the right (i.e., higher incremental step number) for another point on the graph which, when connected to the selected point, would have the maximum slope of all such points to the right. The routine connects those points and repeats the process by selecting the other point and again searching to the right for another point with the maximum slope. The known WS metric values are stored in an array named “value,” which is passed to this routine. In step 1501, the routine sets the variable startindex to zero. The variable startindex is used to indicate the index of the point in the array value for which the corresponding point with the maximum slope is to be determined. In step 1502, the routine sets the variables maxslope and maxindex to zero and sets the variable endindex to the value of variable startindex plus one. The variables maxindex and maxslope are used to track the point with the maximum slope when searching. In steps 1503-1507, the routine loops searching towards the end of the graph (i.e., to the right) for the point which results in the maximum slope from the point indexed by the variable startindex. In step 1503, the routine sets the variable slope equal to the difference between the array value indexed by the variable startindex and the array value indexed by the variable endindex, divided by the number of steps between the indexes. In step 1504, if the variable slope is less than the variable maxslope, then a point with a larger slope has already been found and the routine continues at step 1506 to continue searching, else the routine continues at step 1505. In step 1505, the routine sets the variable maxslope equal to the variable slope and the variable maxindex equal to the variable endindex. In step 1506, the routine increments the variable endindex. In step 1507, if the variable endindex is less than the variable numvalues (i.e., number of metric values), then the routine loops to step 1503 to check the slope for the next point in the graph, else the routine continues at step 1508. In steps 1508-1510, the routine sets the value of the defined rate of improvement for the points between the variable startindex and the variable maxindex to the value of the variable maxslope. In step 1508, the routine sets the variable loopindex equal to the variable startindex. In step 1509, the routine sets the array hull indexed by the variable loopindex equal to maxslope and increments the variable loopindex. In step 1510, if the variable loopindex is less than the variable maxindex, then the routine loops to step 1509, else the routine continues at step 1511. In step 1511, the routine sets the variable startindex equal to the variable maxindex to continue searching from the point indexed by the variable maxindex. In step 1512, if the variable startindex is less than the variable numvalues minus one, then the routine loops to step 1502 to continue determining the defined rate of improvement for the points past the variable maxindex, else the routine is done.
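
The routine of FIG. 15 can be sketched as follows, using the variable names from the description. One deliberate deviation is labeled in the code: maxslope starts at negative infinity rather than zero, an assumption made here so that the search always advances even if the supplied values increase after the starting point.

```python
import math


def defined_rate_of_improvement(value):
    """Return hull, where hull[i] is the defined rate of improvement
    (absolute slope of the convex hull) for the step from i to i + 1."""
    numvalues = len(value)
    hull = [0.0] * max(numvalues - 1, 0)
    startindex = 0                                         # step 1501
    while startindex < numvalues - 1:                      # step 1512
        maxslope = -math.inf   # assumption: -inf instead of zero (step 1502)
        maxindex = startindex + 1
        for endindex in range(startindex + 1, numvalues):  # steps 1503-1507
            slope = (value[startindex] - value[endindex]) / (endindex - startindex)
            if slope >= maxslope:                          # steps 1504-1505
                maxslope = slope
                maxindex = endindex
        for loopindex in range(startindex, maxindex):      # steps 1508-1510
            hull[loopindex] = maxslope
        startindex = maxindex                              # step 1511
    return hull
```

For the values `[10, 9, 5, 4.5, 4, 4]` the sketch yields rates of 2.5 for the first two steps, 0.5 for the next two, and 0 for the last, so the rate never increases from one hull segment to the next.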

[0119] During the incremental improvement process, the defined rate of improvement for the current incremental step can, of course, not be determined because the WS metric values for subsequent steps are not yet known. Thus, the goal of the rate of improvement (ROI) termination condition is to estimate accurately the defined rate of improvement of the current incremental step so that additional incremental steps can be avoided if the defined rate of improvement indicates that they would not be worth the computational expense.

[0120] The techniques described below generate coefficients for the filter for the instantaneous rate of improvement of the WS metric values. As a first step, a running minimum of the WS metric values is maintained. This running minimum effects a filtering of artifacts in the WS metric value resulting from invocations of the linker. In addition, the running minimum monotonically decreases, which is a desirable attribute for subsequent filtering. The coefficient generation techniques analyze data (e.g., WS metric values) for a large number of runs of the WS improvement system when generating the coefficients.

[0121] (3) Generating Coefficients Using Frequency-Domain Analysis

[0122] The frequency-domain analysis technique computes a power spectrum for the instantaneous rate of improvement and a power spectrum for the defined rate of improvement for various runs of the WS improvement system. The power spectra are obtained by computing a discrete Fourier transform of the time series data for the rate of improvement. FIG. 16 illustrates the power spectra. The dashed line represents the power spectrum for the instantaneous rate of improvement, and the solid line represents the power spectrum for the defined rate of improvement. The horizontal axis is normalized to radian frequencies, and the vertical axis is in decibels. The technique calculates the difference between the two spectral curves and offsets the difference so that the value is zero at a frequency of zero. The dashed line in FIG. 17 illustrates the offset differences in the power spectra. The technique then fits the frequency response curve of a filter to the offset difference. In FIG. 17, the solid line represents the frequency response of a first-order IIR filter that minimizes the mean squared error with respect to the offset differences. Alternatively, a higher-order IIR filter or a FIR filter could be used. Also, a curve-fitting criterion other than mean squared error could be used. Since the frequency response varies non-linearly with the filter coefficients, an iterative technique, such as the Levenberg-Marquardt algorithm, is used. The Levenberg-Marquardt algorithm is described in Press, W. et al., “Numerical Recipes in C: The Art of Scientific Computing,” 2nd ed., Cambridge University Press, 1992, pp. 683-88.

[0123] FIG. 18 is a flow diagram of a routine to generate the filter coefficients using the frequency-domain analysis. In step 1801, the routine collects data from various runs of the WS improvement system. These runs can use a termination condition based on a fixed number of incremental steps or a fixed time period. In this step, the routine also computes the defined rate of improvement according to the steps of FIG. 15. In step 1802, the routine computes the power spectrum of the instantaneous rate of improvement of the running minimum of the collected WS metric values using a discrete Fourier transform. In step 1803, the routine computes the power spectrum of the actual defined rate of improvement. In step 1804, the routine computes the offset difference between the power spectra. In step 1805, the routine uses the Levenberg-Marquardt algorithm to determine the coefficients for the filter.
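
Two building blocks of this analysis can be sketched in a few lines: a power spectrum in decibels computed with a direct discrete Fourier transform, and the frequency response of the first-order IIR filter $y_{i} = A_{0} x_{i} + B_{1} y_{i-1}$, which is $A_{0}/(1 - B_{1} e^{-j\omega})$. The function names and the normalization of the spectrum by the series length are assumptions, and the iterative Levenberg-Marquardt fit of step 1805 is not shown.

```python
import cmath
import math


def power_spectrum_db(series):
    """Return the power spectrum (in dB) of a real series for the
    frequencies 0 .. pi, using a direct O(n^2) discrete Fourier transform."""
    n = len(series)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(series))
        power = abs(coeff) ** 2 / n      # assumed normalization
        spectrum.append(10 * math.log10(power) if power > 0 else -math.inf)
    return spectrum


def iir_response_db(a0, b1, omega):
    """Magnitude response (in dB) of y[i] = a0 * x[i] + b1 * y[i-1]
    at normalized radian frequency omega."""
    h = a0 / (1 - b1 * cmath.exp(-1j * omega))
    return 20 * math.log10(abs(h))
```

When `a0 + b1 == 1`, the response is 0 dB at a frequency of zero, which matches the offset applied to the difference of the spectra, and it falls off at higher frequencies.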

[0124] (4) Generating Coefficients Using Time-Domain Analysis

[0125] The time-domain analysis technique generates coefficients for a FIR filter based on the instantaneous rate of improvement of the running minimum of the WS metric values and the actual defined rate of improvement of various runs of the WS improvement system. The technique first generates coefficients for a first-order FIR filter and then a second-order FIR filter. If the improvement between the first-order and second-order FIR filters is significant, then the technique repeats this process for successively higher-order FIR filters until the improvement is no longer significant. The coefficients for the highest-order FIR filter that showed a significant improvement are to be used in the filtering. Alternatively, the WS improvement system can determine whether the improvement in the next higher-order FIR filter would be significant without even generating the coefficients for that next higher-order FIR filter. The WS improvement system can calculate the error between the estimated rate of improvement using the first-order FIR filter and the actual defined rate of improvement. If the correlation between that error and the additional WS metric value that would be added with the next higher-order FIR filter is significant, then the next higher-order FIR filter is generated and the process continues, else the current-order FIR filter is used.

[0126] FIG. 19 illustrates the instantaneous rate of improvement and the actual defined rate of improvement for a sample run. The circles indicate the differences in the running minimum of the WS metric values for each incremental step, and the squares indicate the differences in the WS metric values that would produce the actual defined rate of improvement. The circles thus represent input values, and the squares represent target values.

[0127] The technique initially derives a first-order, linear expression for a function that relates each input value to the corresponding target value. The function is thus of the form:

T_(n)=A₀I_(n)

[0128] Each target value, T_(n), is the product of a constant coefficient, A₀, and the current input value, I_(n). For example, the input value I₁₂ in FIG. 19 is 3.6, and the target value T₁₂ is 3.8, meaning that the ideal value of A₀ is 3.8/3.6=1.06. However, the input value I₁₃ is 4.0, and the target value T₁₃ is 3.8, meaning that the ideal value of A₀ is 3.8/4.0=0.95. Since A₀ cannot equal both of these values, the technique chooses as a compromise the value for A₀ that minimizes the mean squared error over all target values. Since this fit is linear, the value for the coefficient can be determined through standard linear regression techniques, an important consideration since the volume of data is likely to be large. For example, if the mean squared error is minimized by a value of 0.98 for A₀, then the values and associated residual errors are as indicated in Table 2.

TABLE 2

 n    I_(n)   T_(n)   A₀I_(n)   E_(n) (error)
10    4.1     3.9     4.0        0.1
11    3.7     3.9     3.6       −0.3
12    3.6     3.8     3.5       −0.3
13    4.0     3.8     3.9        0.1
14    3.9     3.8     3.8        0.0
15    3.4     3.6     3.3       −0.3
16    3.9     3.6     3.8        0.2
17    4.1     3.6     4.0        0.4
18    3.4     3.5     3.3       −0.2
19    3.7     3.5     3.6        0.1
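As an illustrative sketch only, the first-order fit can be reproduced in Python from the ten rows of Table 2. With no intercept term, the least-squares coefficient is A₀ = Σ I_(n)·T_(n) / Σ I_(n)². The function name `fit_first_order` is hypothetical, and the error sign convention (estimate minus target) is inferred from the table rather than stated in the text.

```python
# Input and target values for n = 10..19, copied from Table 2.
inputs = [4.1, 3.7, 3.6, 4.0, 3.9, 3.4, 3.9, 4.1, 3.4, 3.7]
targets = [3.9, 3.9, 3.8, 3.8, 3.8, 3.6, 3.6, 3.6, 3.5, 3.5]

def fit_first_order(inputs, targets):
    """Least-squares A0 for the model T_n = A0 * I_n (no intercept)."""
    numerator = sum(i * t for i, t in zip(inputs, targets))
    denominator = sum(i * i for i in inputs)
    return numerator / denominator

a0 = fit_first_order(inputs, targets)

# Residual errors, using the sign convention inferred from Table 2:
# E_n = A0 * I_n - T_n.
errors = [a0 * i - t for i, t in zip(inputs, targets)]

print(round(a0, 2))
print([round(e, 1) for e in errors])
```

On these ten rows the fitted coefficient rounds to the 0.98 used in the text, and the rounded residuals agree with the error column of Table 2.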

[0129] Such a first-order FIR filter is unlikely to provide a very good estimate of the target values, so the technique determines whether the filtering can be improved by using a higher-order FIR filter. For example, the previous estimate of T₁₃ was based only on I₁₃. An estimate with a second-order FIR filter is based on both I₁₃ and I₁₂. Similarly, the previous estimate of T₁₂ was based only on I₁₂. An estimate with a second-order FIR filter is based on both I₁₂ and I₁₁. The second-order, linear expression is of the form:

T_(n)=A₀I_(n)+A₁I_(n−1)

[0130] The technique then determines whether there is a significant reduction in the residual errors from adding the additional linear term to the FIR filter. The technique can determine the likely benefit from the additional term without actually performing the derivation of the new expression. The technique does so by examining the statistical correlation between each error term (E_(n)) and the input value that leads each error term by one step (I_(n−1)). This analysis can employ any effective correlation metric, such as the normal correlation coefficient or the rank correlation coefficient. If there is a significant statistical correlation, then there is benefit to deriving the higher-order expression. The technique repeats this process for third-order linear expressions, and then fourth-order, and so on, until there is no statistically significant improvement from increasing the order of the linear expression.
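A minimal sketch of this check, assuming the normal (Pearson) correlation coefficient is the chosen metric: each error E_(n) from Table 2 is paired with the input I_(n−1) from the preceding step. The helper name `pearson` and the cutoff `SIGNIFICANCE_THRESHOLD` are hypothetical; the text does not specify how significance is judged.

```python
import math

# Values for n = 10..19 from Table 2.
inputs = [4.1, 3.7, 3.6, 4.0, 3.9, 3.4, 3.9, 4.1, 3.4, 3.7]
errors = [0.1, -0.3, -0.3, 0.1, 0.0, -0.3, 0.2, 0.4, -0.2, 0.1]

def pearson(xs, ys):
    """Normal (Pearson) correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Pair E_n (n = 11..19) with I_(n-1) (n - 1 = 10..18).
r = pearson(errors[1:], inputs[:-1])

# Hypothetical cutoff; a real system would pick a statistical test
# suited to its sample size.
SIGNIFICANCE_THRESHOLD = 0.5
add_second_term = abs(r) >= SIGNIFICANCE_THRESHOLD
```

The ten rows of Table 2 are far too few to judge significance reliably; the text contemplates a large volume of data collected over many runs.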

[0131] The technique thus generates a set of coefficients (A₀, A₁, . . . , A_(N)) for an N^(th)-order linear expression that is equivalent to an N^(th)-order FIR filter. As the order increases, so too does the initial latency before which no estimate of the rate of improvement is available. The technique can set a cap on the maximum value of N in order to limit this latency.

[0132] FIG. 20 is a flow diagram of a routine to generate the coefficients of the filter using time-domain analysis. In step 2001, the routine collects the WS metric values for various runs of the WS improvement system. The routine also calculates the corresponding target WS metric value derived from the actual defined rate of improvement. In step 2002, the routine initializes the order of the FIR filter to one. In steps 2003-2006, the routine loops generating coefficients for successively higher-order FIR filters until the correlation between each error term (E_(n)) in the current-order (N) FIR filter and each N^(th) previous input value (I_(n−N)) is not significant. In step 2003, the routine derives the coefficients for the current order and the error terms. In step 2004, the routine calculates the correlation between each error term and each N^(th) previous WS metric value. In step 2005, if the correlation is significant, then the routine increments the order in step 2006 and loops to step 2003 to process the next higher order, else the routine is done.
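The loop of steps 2002-2006 can be sketched in Python as follows. This is an illustrative sketch, not the routine itself: all function names, the correlation cutoff `threshold`, and the cap `max_order` are assumptions. An order-N filter here has N coefficients (so a first-order filter has the single coefficient A₀, as in the example above), and the coefficients are found by solving the least-squares normal equations with a small Gaussian elimination.

```python
import math

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def fit_fir(inputs, targets, order):
    """Step 2003: least-squares coefficients and errors E_n = estimate - target."""
    rows = [[inputs[n - k] for k in range(order)]
            for n in range(order - 1, len(inputs))]
    ys = targets[order - 1:]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
            for i in range(order)]
    moment = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(order)]
    coeffs = solve(gram, moment)
    errors = [sum(c * v for c, v in zip(coeffs, r)) - y for r, y in zip(rows, ys)]
    return coeffs, errors

def pearson(xs, ys):
    """Normal correlation; a (near-)constant series is treated as uncorrelated."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx < 1e-18 or syy < 1e-18:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def select_order(inputs, targets, threshold=0.5, max_order=8):
    order = 1                                              # step 2002
    while True:
        coeffs, errors = fit_fir(inputs, targets, order)   # step 2003
        # Step 2004: pair E_n with I_(n - order) for every n >= order.
        r = pearson(errors[1:], inputs[:len(inputs) - order])
        if abs(r) < threshold or order >= max_order:       # step 2005
            return order, coeffs
        order += 1                                         # step 2006
```

The `max_order` cap plays the role of the limit on N mentioned above for bounding the initial latency.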

[0133] FIG. 21 is a flow diagram of a routine to collect samples for generating coefficients for a filter. In steps 2101-2110, the routine loops selecting an initial layout, incrementally improving the layout, and calculating the instantaneous rate of improvement of the running minimum and the actual defined rate of improvement based on the WS metric value of the incremental steps. In step 2101, the routine selects an initial layout. In steps 2102-2106, the routine incrementally improves the layout until a termination condition (e.g., fixed number of steps) is satisfied. In step 2102, the routine incrementally improves the layout. In step 2103, the routine calculates the WS metric value for the incrementally-improved layout. In step 2104, the routine calculates the running minimum of the WS metric values. In step 2105, the routine determines whether the termination condition is satisfied. In step 2106, if satisfied, then the routine continues at step 2107, else the routine loops to step 2102 to continue incrementally improving the layout. In step 2107, the routine calculates the instantaneous rate of improvement of the running minimum. In step 2109, the routine calculates the actual defined rate of improvement. In step 2110, if enough samples have been collected, then the routine is done, else the routine loops to step 2101 to select the next initial layout.
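Steps 2104 and 2107 can be illustrated with a short Python sketch. The function names are hypothetical, and the instantaneous rate of improvement is assumed here to be the per-step decrease in the running minimum, so that an improvement is a positive number. The defined rate of improvement (step 2109) depends on the steps of FIG. 15 and is not sketched.

```python
def running_minimum(ws_metric_values):
    """Step 2104: smallest WS metric value seen up to and including each step."""
    result = []
    best = float("inf")
    for value in ws_metric_values:
        best = min(best, value)
        result.append(best)
    return result

def instantaneous_improvement(ws_metric_values):
    """Step 2107: per-step decrease in the running minimum (assumed sign)."""
    minima = running_minimum(ws_metric_values)
    return [minima[i - 1] - minima[i] for i in range(1, len(minima))]
```

For example, the metric values 10, 8, 9, 7, 7, 6 give the running minimum 10, 8, 8, 7, 7, 6: a step that makes the layout worse leaves the running minimum unchanged and therefore contributes an improvement of zero.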

[0134] (5) Enhancing the FIR Filter

[0135] The technique can improve upon the FIR filter with the generated coefficients by converting it into an IIR filter. The technique adds one or more autoregressive (AR) coefficients (i.e., poles) to the filter. The technique adds the AR coefficients to obtain an optimal tradeoff between confidence and mean lag in the filter. Confidence refers to the degree of certainty that the rate of improvement is not underestimated. In other words, it is the likelihood that the incremental improvements will not terminate prematurely. Mean lag refers to the mean number of incremental steps that elapse between the ideal number at which to terminate and the actual number at which the incremental process is terminated. It is desirable to have a very high confidence and a very small mean lag. However, as the confidence level increases the mean lag also increases. Conversely, as the mean lag decreases the confidence level also decreases. The optimal tradeoff between confidence and mean lag will vary based on the environment in which the WS improvement system is used. However, a function that inputs the confidence and mean lag and outputs a scalar value that rates the inputs based on a tradeoff strategy can be defined for each environment.

[0136] The technique employs an iterative, nonlinear minimization approach that varies the values of one or more AR coefficients over a range of stable values until the minimum value of the rating function is achieved. Brent's Method or Powell's Method (for multiple AR coefficients) can be used to minimize the value of the rating function. (See “Numerical Recipes in C,” at 402-20.)
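As a rough illustration of a one-dimensional search over a single AR coefficient, the following Python sketch uses a golden-section search. Golden-section search is a simpler relative of Brent's Method and is substituted here only for brevity; it is not the method named in the text. The function name, the tolerance, and the quadratic stand-in for the rating function are all assumptions.

```python
import math

def golden_section_minimize(rating, low, high, tolerance=1e-6):
    """Locate a minimum of a unimodal function on [low, high]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = high - inv_phi * (high - low)
    d = low + inv_phi * (high - low)
    fc, fd = rating(c), rating(d)
    while high - low > tolerance:
        if fc < fd:
            # The minimum lies in [low, d]; reuse c as the new upper probe.
            high, d, fd = d, c, fc
            c = high - inv_phi * (high - low)
            fc = rating(c)
        else:
            # The minimum lies in [c, high]; reuse d as the new lower probe.
            low, c, fc = c, d, fd
            d = low + inv_phi * (high - low)
            fd = rating(d)
    return (low + high) / 2.0

# Hypothetical rating function with its minimum at a pole of 0.3. In
# practice the rating would come from evaluating sample runs. A single
# real pole is stable when its magnitude is below one, hence the range.
best_pole = golden_section_minimize(lambda a: (a - 0.3) ** 2, -0.999, 0.999)
```

A rating computed from a finite set of sample runs may be piecewise constant rather than smooth, so a practical search might need to tolerate plateaus.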

[0137] FIG. 22 is a flow diagram of a routine that evaluates a set of AR coefficients. By repeatedly invoking this routine for various sets of AR coefficients, an optimal set can be identified. This routine calculates the mean lag and confidence based on processing sample runs of the incremental process using the set of AR coefficients. In step 2201, the routine sets to zero both the total lag of all the sample runs and the total count of samples in which the incremental processing is terminated after the ideal incremental step for terminating. In steps 2202-2209, the routine loops selecting various samples and calculating various lag-based statistics. In step 2202, the routine selects the next sample. In step 2203, the routine computes the defined rate of improvement and the filtered rate of improvement (i.e., using the AR coefficients) based on the actual WS metric values for each step. In step 2204, the routine calculates the ideal termination step based on the defined rate of improvement and the actual termination step based on the filtered rate of improvement. In step 2205, the routine calculates the lag. In step 2206, the routine adjusts a running total of the lag. In step 2207, if the lag is negative, then the incremental process would have terminated too early if the filter with the AR coefficients had been used and the routine continues at step 2209, else the routine continues at step 2208. In step 2208, the routine increments the total number of samples in which the termination was not premature. In step 2209, if all the samples have already been selected, then the routine continues at step 2210, else the routine loops to step 2202 to process the next sample. In step 2210, the routine calculates the mean lag as the total lag divided by the number of sample runs and the confidence as the percentage of sample runs in which the termination was not premature. In step 2211, the routine computes a scalar value that rates the desired tradeoff between mean lag and confidence. This scalar value is then used to select the next set of AR coefficients.
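The evaluation routine of FIG. 22 can be sketched in Python as follows. This is a sketch under several assumptions that the text does not fix: each sample is taken to be a pair of precomputed series (instantaneous rate, defined rate), termination is assumed to occur at the first step whose rate falls below a threshold, confidence is returned as a fraction rather than a percentage, and the rating function is supplied by the caller. All names are hypothetical.

```python
def iir_filter(xs, fir, ar):
    """y[n] = sum_k fir[k] * x[n-k] + sum_j ar[j] * y[n-1-j]."""
    ys = []
    for n in range(len(xs)):
        acc = sum(b * xs[n - k] for k, b in enumerate(fir) if n - k >= 0)
        acc += sum(a * ys[n - 1 - j] for j, a in enumerate(ar) if n - 1 - j >= 0)
        ys.append(acc)
    return ys

def first_step_below(rates, threshold):
    """Assumed termination rule: first step whose rate is under the threshold."""
    for step, rate in enumerate(rates):
        if rate < threshold:
            return step
    return len(rates)

def evaluate_ar(samples, fir, ar, threshold, rating):
    total_lag = 0                                        # step 2201
    not_premature = 0
    for instantaneous, defined in samples:               # step 2202
        filtered = iir_filter(instantaneous, fir, ar)    # step 2203
        ideal = first_step_below(defined, threshold)     # step 2204
        actual = first_step_below(filtered, threshold)
        lag = actual - ideal                             # step 2205
        total_lag += lag                                 # step 2206
        if lag >= 0:                                     # step 2207
            not_premature += 1                           # step 2208
    mean_lag = total_lag / len(samples)                  # step 2210
    confidence = not_premature / len(samples)
    return mean_lag, confidence, rating(mean_lag, confidence)  # step 2211
```

A caller might pass, for instance, `rating=lambda lag, conf: lag + 100 * (1 - conf)`, a purely hypothetical tradeoff in which lower scores are better and premature termination is penalized heavily. The sketch does not renormalize the filter gain after poles are added, which a practical implementation would likely need to consider.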

[0138] From the foregoing it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

1. A method for evaluating the locality of references for a layout of a computer program, the computer program to be executed by a computer system with a page architecture, each page having a multiplicity of memory locations, the method comprising: for each of a plurality of memory locations of a page, calculating a metric value indicating a working set size of the layout when the layout is positioned to start at that memory location; and combining the calculated metric values as an indication of the locality of references of the layout of the computer program.
2. The method of claim 1 wherein the plurality of memory locations include each memory location of a page.
3. The method of claim 1 wherein the plurality of memory locations include less than all the memory locations on a page.
4. The method of claim 1 wherein the plurality of memory locations are separated by relatively prime number of memory locations.
5. The method of claim 1 wherein the plurality of memory locations includes less than all the memory locations on a page and are approximately evenly distributed throughout the page.
6. The method of claim 1 wherein the plurality of memory locations includes less than all the memory locations on a page, are approximately evenly distributed throughout the page, and are separated by relatively prime number of memory locations.
7. The method of claim 1 wherein the combining of the calculated metric values includes averaging the calculated metric values.
8. A method in a computer system for determining a number of different layouts of a program image that should be evaluated when selecting a layout as a starting layout for a working set optimization process, the method comprising: collecting data on results of using the working set optimization process to improve the working set of various layouts of various program images; and determining from the collected data a number of layouts whose computational expense in generating and evaluating the layouts is worth an anticipated reduction in the working set when the program image is optimized by the working set optimization process.
9. The method of claim 8 wherein the collecting of data includes calculating a marginal density of the working set size of the optimized layout that is produced from one of the various layouts with the smallest locality of reference.
10. The method of claim 8 wherein the collecting of data includes calculating a mean marginal density that is

$\mu = \int_{-\infty}^{\infty} y\, g(y)\, dy$

where

$g(y) = N \int_{-\infty}^{\infty} f(x,y) \left( \int_{-\infty}^{\infty} \int_{x}^{\infty} f(t,u)\, dt\, du \right)^{N-1} dx$

where N is the number of layouts and ƒ is a joint probability density function that relates a metric for each unoptimized layout to a metric for each corresponding optimized layout.
11. The method of claim 10 wherein the joint probability density function is

$f(x,y) = \frac{1}{2\pi \sqrt{1 - \rho^{2}}}\, e^{-\frac{x^{2} - 2\rho xy + y^{2}}{2(1 - \rho^{2})}}$

and where ρ is a normal correlation coefficient between a locality of reference metric for the unoptimized layouts and a working set size metric for the optimized layouts.
12. The method of claim 10 wherein the anticipated reduction is μσ where σ is a standard deviation of a working set size metric for the optimized layouts.
13. The method of claim 10 wherein the probability density function is a standard bivariate normal distribution.
14. The method of claim 8 wherein the determining of the number of layouts includes determining the computational expense of generating and evaluating successively larger numbers of layouts until the computational expense is not worthwhile.
15. A method for estimating a rate of change per time period in results of a system, the system producing a result for each of a plurality of steps performed by the system, the steps occurring at irregular time intervals, the method comprising: estimating the change in results at each of the plurality of steps; estimating the time for each step; and combining the estimated change in results per step with the estimated time per step to estimate the rate of change per time period in the results of the system.
16. The method of claim 15 wherein the estimating of the change in results includes filtering the actual change in results from one step to the next step.
17. The method of claim 15 wherein the estimating of the time for each step includes evaluating a formula that estimates the time per step based on estimated numbers of times that various sub-steps are performed per step.
18. The method of claim 17 wherein the estimated numbers of times that various sub-steps are performed are based on evaluating actual results of the system.
19. The method of claim 15 wherein the estimating of the change in results includes filtering the actual change in results from one step to the next step and wherein the estimating of the time for each step includes evaluating a formula that estimates the time per step based on number of times various sub-steps are performed.
20. The method of claim 15 wherein the system is a computer program layout optimization system.
21. The method of claim 15 including terminating the system when the estimated rate of change per time period is outside of a threshold rate of change.
22. The method of claim 15 wherein the estimating of the change in the results includes filtering the actual change in results from one step to the next with a filter that is generated from analysis of results produced by the system.
23. The method of claim 22 wherein the filter is generated by using a frequency-domain analysis of an actual rate of change per step and defined rate of change per step, the defined rate of change being calculated for each step based on knowledge of results of subsequent steps.
24. The method of claim 22 wherein the filter is generated by using a time-domain analysis of an actual rate of change per step and defined rate of change per step, the defined rate of change being calculated for each step based on knowledge of results of subsequent steps.
25. A method in a computer system for estimating the rate of improvement in the working set for a plurality of incrementally improved layouts of a computer program, the method comprising: estimating the change in working set size from one incrementally improved layout to the next incrementally improved layout; estimating the time needed to incrementally improve the layout; and combining the estimated change in working set size with the estimated time needed to incrementally improve the working set for that layout to estimate the rate of improvement.
26. The method of claim 25 wherein the estimating of the change in working set size includes filtering the actual change in working set size from one incremental improvement to the next.
27. The method of claim 25 wherein the estimating of the time to incrementally improve the layout includes evaluating a formula that estimates the time based on estimated numbers of times that various sub-steps are performed when incrementally improving the layout.
28. The method of claim 27 wherein the estimated numbers of times that various sub-steps are performed are based on evaluating actual results of the incremental improvement.
29. The method of claim 25 wherein the estimating of the change includes filtering the actual change in working set size from one incremental improvement to the next and wherein the estimating of the time to incrementally improve the layout includes evaluating a formula that estimates the time based on number of times various sub-steps are performed when incrementally improving the layout.
30. The method of claim 25 wherein the estimating of the change in working set size filters an actual change in working set size and the estimating of time computes the time based on number of times the layout has been incrementally improved.
31. The method of claim 25 including terminating the system when the estimated rate of change per time period is outside of a threshold rate of change.
32. The method of claim 25 wherein the estimating of the change in the working set size includes filtering the actual change in working set size from one incremental improvement to the next with a filter that is generated from analysis of working set size produced during the incremental improvement of other layouts.
33. The method of claim 32 wherein the filter is generated by using a frequency-domain analysis of an actual rate of change per incremental improvement and defined rate of change per incremental improvement, the defined rate of change being calculated for each incremental improvement based on knowledge of working set size of subsequent incremental improvements.
34. The method of claim 32 wherein the filter is generated by using a time-domain analysis of an actual rate of change per incremental improvement and defined rate of change per incremental improvement, the defined rate of change being calculated for each incremental improvement based on knowledge of working set size of subsequent incremental improvements.
35. A method in a computer system for estimating a rate of change of results of a function over an indeterminate interval, the method comprising: receiving a plurality of results of the function; calculating a negative slope of a convex hull of the results of the function; and using the calculated negative slope as an estimate of the rate of change of results of the function.
36. The method of claim 35 wherein the results of the function are generated by calculating a working set size of a program image as the layout is incrementally improved.
37. The method of claim 35 including calculating a running minimum of the results of the function wherein the calculating of the negative slope of a convex hull does so for the calculated running minimum.
38. The method of claim 37 wherein the results of the function are generated by calculating a working set size of a program image as the layout is incrementally improved.
39. A method in a computer system for identifying coefficients for a filter, the filter for filtering results of a function, the method comprising: collecting sample input values to the filter; identifying desired output values from the filter for the collected sample input values; generating a power spectrum of the collected sample input values; generating a power spectrum of the identified desired output values; calculating a difference between the generated power spectra; and identifying coefficients that minimize differences between the calculated differences.
40. The method of claim 39 wherein the calculated difference is offset so that the calculated difference at a frequency of zero is zero.
41. The method of claim 39 wherein the results of the function are generated by calculating a working set size of a program image as the working set size is incrementally improved.
42. The method of claim 39 wherein a Levenberg-Marquardt algorithm is used to identify the coefficients.
43. A method in a computer system for identifying coefficients for a finite impulse response filter, the method comprising: collecting sample input values for a function; identifying desired output values for the filter for the collected sample input values; approximating the output values from the input values using a linear fitting technique; and setting the coefficients to values obtained from the linear-fitting technique.
44. The method of claim 43 wherein the results of the function are generated by calculating a working set size of a program image as the working set size is incrementally improved.
45. The method of claim 43 wherein the linear-fitting technique is for a certain-order finite impulse response filter and including determining an error between the approximated output values and the desired output values and when the difference between the determined error and input for the next higher-order finite impulse response filter is not significant using the certain-order finite impulse response filter.
46. The method of claim 45 wherein the significance of the difference between the determined error and input for the next higher-order finite impulse response filter is determined by correlating the determined error with the additional input for the next higher-order finite impulse response filter.
47. A method for converting a finite impulse response filter into an infinite impulse response filter so that a desired tradeoff between confidence and mean lag is established, the method comprising: for each of a plurality of sets of autoregressive coefficients, calculating results of the infinite impulse response filter with the set of autoregressive coefficients, and generating a value that rates the confidence and mean lag for the calculated results; and selecting the set of autoregressive coefficients with the generated value with the highest rating of confidence and mean lag for the infinite impulse response filter.
48. The method of claim 47 wherein the filters are for determining a rate of improvement in a working set optimization process.
49. The method of claim 47 wherein the plurality of sets are generated using Brent's method.
50. The method of claim 47 wherein the plurality of sets are generated using Powell's method.
51. A computer-readable medium containing instructions for causing a computer system to evaluate locality of references for a layout of a computer program, the computer program to be executed by a computer system with a page architecture, each page having memory locations, by: for each of a plurality of selected memory locations of a page, estimating a working set size of the layout when the layout is positioned to start at that memory location; and combining the estimated working set size as an indication of the locality of references of the layout of the computer program.
52. The computer-readable medium of claim 51 wherein the plurality of selected memory locations include each memory location of a page.
53. The computer-readable medium of claim 51 wherein the plurality of selected memory locations include less than all the memory locations on a page.
54. The computer-readable medium of claim 51 wherein the plurality of selected memory locations are separated by relatively prime number of memory locations.
55. The computer-readable medium of claim 51 wherein the plurality of selected memory locations includes less than all the memory locations on a page and are approximately evenly distributed throughout the page.
56. The computer-readable medium of claim 51 wherein the plurality of selected memory locations includes less than all the memory locations on a page, are approximately evenly distributed throughout the page, and are separated by relatively prime number of memory locations.
57. The computer-readable medium of claim 51 wherein the combining of the estimated working set size includes averaging the estimated working set sizes.
58. A computer system for estimating the rate of improvement in the working set size for a plurality of layouts of a computer program, the layouts resulting from a layout optimization process, comprising: a first estimating component that estimates the change in working set size from one improved layout to the next improved layout; a second estimating component that estimates the time needed to improve the layout; and a combining component that combines the estimated change in working set size with the estimated time needed to improve the working set for that layout to estimate the rate of improvement.
59. The system of claim 58 wherein the estimating of the change in working set size includes filtering the actual change in working set size from one layout to the next.
60. The system of claim 58 wherein the estimating of the time to improve the layout includes evaluating a formula that estimates the time based on estimated numbers of times that various sub-steps are performed when improving the layout.
61. The system of claim 60 wherein the estimated numbers of times that various sub-steps are performed are based on evaluating actual results of the improvement.
62. The system of claim 58 wherein the estimating of the change includes filtering the actual change in working set size from one layout to the next and wherein the estimating of the time to improve the layout includes evaluating a formula that estimates the time based on number of times various sub-steps are performed when improving the layout.
63. The system of claim 58 wherein the estimating of the change in working set size filters an actual change in working set size and the estimating of time computes the time based on number of times the layout has been improved.
64. The system of claim 58 including terminating the system when the estimated rate of change per time period is outside of a threshold rate of change.
65. The system of claim 58 wherein the estimating of the change in the working set size includes filtering the actual change in working set size from one layout to the next with a filter that is generated from analysis of working set size produced during the improvement of other layouts.
66. The system of claim 65 wherein the filter is generated by using a frequency-domain analysis of an actual rate of change per improvement and defined rate of change per improvement, the defined rate of change being calculated for each improvement based on knowledge of working set size of subsequent improvements.
67. The system of claim 65 wherein the filter is generated by using a time-domain analysis of an actual rate of change per improvement and defined rate of change per improvement, the defined rate of change being calculated for each improvement based on knowledge of working set size of subsequent improvements.
68. A computer-readable medium containing instructions for causing a computer system to convert a finite impulse response filter into an infinite impulse response filter so that a desired tradeoff between confidence and mean lag is established, by: for each of a plurality of sets of autoregressive coefficients, for the infinite impulse response filter calculating results of the infinite impulse response filter with the set of autoregressive coefficients; and selecting the set of autoregressive coefficients with a highest rating of the tradeoff between confidence and mean lag for the infinite impulse response filter.
69. The computer-readable medium of claim 68 wherein the filters are for determining a rate of improvement in a working set optimization process.
70. The computer-readable medium of claim 68 wherein the plurality of sets are generated using Brent's method.
71. The computer-readable medium of claim 68 wherein the plurality of sets are generated using Powell's method.